Abstract

We describe mutual theory refinement, a
method for refining world models in a reac- 
tive system. The method detects failures, ex- 
plains their causes, and repairs the approxi- 
mate models which caused the failures. Our 
approach focuses on using one approximate 
model to refine another. 

1. Introduction 

The world model guiding a reactive system is always 
approximate. Thus even the most carefully coded sys- 
tem will occasionally fail. Our long-term objective is 
to enable a reactive system to learn - from its fail- 
ures - refinements of its world model. In this paper, 
we describe interim results on an incremental learn- 
ing method, mutual theory refinement. The method 
detects failures, explains their causes, and repairs the 
approximate models which caused the failures. The 
learning is incorporated into a reactive system, the Entropy
Reduction Engine (Bresina & Drummond, 1990; Drummond et al.,
1991). Our approach focuses on using one approximate model to
refine another approximate model. It can also determine when the
approximate model is not sufficient to explain the failure,
and degrade gracefully, resorting to inductive or rote 
learning. This is accomplished by exploiting two com- 
mon features of knowledge-based reactive systems: (i) 
multiple related approximate models whose underlying 
principles overlap, and (ii) multiple sources of experience
(e.g., planning and reaction) which, when compared,
provide a strong basis for failure detection and
explanation. 

A knowledge-based reactive system initially has 
models of the world and actions, however approxi- 
mate. If that knowledge exists, why not exploit it for 
learning? Our work follows the key idea: use knowl- 
edge when you can, yet recognize when you cannot 
use it. We are therefore exploring an analytic end of 
the learning spectrum, while addressing the problem of
how to detect the limits of the knowledge and fall back 
on inductive methods. Our method builds on earlier 
work in explanation-based learning (EBL) from failure 
for refining approximate theories (Mostow & Bhatnagar,
1987; Tadepalli, 1989; Chien, 1989; Gupta, 1987).
Most of these methods assume a complete and correct 
theory is available to fix the approximate one. In con- 
trast, we refine one approximate theory with another, 
and recognize the limitations of either. 

We first present some features of our performance 
system, the Entropy Reduction Engine. Next, we de- 
scribe the learning method and illustrate it using the 
NASA Tile World experimental domain. We then com- 
pare and contrast our approach with other work, clos- 
ing with a discussion of our future plans. 

2. Background 

Reactive systems are situated in an environment in 
which they sense and act. Our work is cast within one 
such system, the Entropy Reduction Engine (ERE). 
Unlike systems which consist of hand-coded reactions 
tailored to a particular task (notably Brooks’ (1986)
subsumption architecture), the ERE architecture uses
planning and scheduling to automatically synthesize
reactions appropriate to the given goals and environment.
Briefly, this synthesis is accomplished as follows
(Drummond & Bresina, 1990). First, a planning
component, called the projector, performs chronological
search through the space of possible world model
states to select an operator sequence that satisfies the 
given goal. Then, using goal regression, the opera- 
tor sequence is compiled into Situated Control Rules 
(SCRs); each rule specifies the appropriate action to
take in a given situation to satisfy the desired goal. 
These SCRs are used as advice by the execution com- 
ponent, called the reactor. 

The projector uses two approximate models: opera- 
tors and domain constraints (Drummond, 1986). The 
operators model the agent’s actions in terms of preconditions
and effects. The effects can be specified as a set of
nondeterministic variant outcomes, each associated with the
probability that the variant outcome will result from execution
of the operator. Domain constraints model physical laws as sets
of facts which can never co-occur in a world state; e.g., “the
agent cannot be in two locations at once”. These two models
capture some of the same underlying principles from 
different perspectives; hence, each can serve as a basis 
to refine the other. 

Note that, unlike STRIPS operators, ERE operators 
specify only what is added, not what is deleted. The 
facts in the pre-state that should be deleted from the 
post-state are those that contradict the operator’s ef- 
fects. These contradictions are detected using the do- 
main constraints. For example, when the agent moves 
to a new cell, the above domain constraint indicates
that the agent cannot also be in the old cell, so that 
(old) fact is deleted. 
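
The delete-by-contradiction rule described above can be sketched in Python. This is only an illustration under stated assumptions, not ERE's implementation: facts are modeled as tuples, a domain constraint is modeled as a hypothetical pairwise predicate over facts, and the names `apply_effects` and `agent_in_two_places` are invented for this sketch.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Fact = Tuple                 # assumed encoding, e.g. ("agent-location", (1, 0))
State = FrozenSet[Fact]
Constraint = Callable[[Fact, Fact], bool]   # True when two facts cannot co-occur


def agent_in_two_places(a: Fact, b: Fact) -> bool:
    """Hypothetical pairwise form of 'the agent cannot be in two locations at once'."""
    return a[0] == "agent-location" and b[0] == "agent-location" and a[1] != b[1]


def apply_effects(pre: State, effects: State,
                  constraints: Sequence[Constraint]) -> State:
    """Add the operator's effects and drop pre-state facts that contradict them."""
    kept = {f for f in pre
            if not any(c(f, e) for c in constraints for e in effects)}
    return frozenset(kept | effects)


pre = frozenset({("agent-location", (1, 0)), ("tile-location", "square", (1, 1))})
post = apply_effects(pre, frozenset({("agent-location", (2, 0))}),
                     [agent_in_two_places])
```

Here the old agent location is removed only because the constraint flags it as contradicting the new one; the tile fact is untouched. If the constraint were absent from the list, the old location would survive into `post`.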

The current testbed used in building and testing 
ERE is the NASA TileWorld experimental domain. 
It consists of a simulator of a single agent in a two-dimensional
grid of cells, able to move and grasp tiles.

3. Mutual Theory Refinement 

Given the hand-coded operators and domain con- 
straints, the system may occasionally experience a fail- 
ure: the expected outcomes do not match the observed 
outcomes. Let WM be a world model representing aspects
of the world. WM is approximate if it is incomplete
and/or incorrect in its representation. A prediction
failure in WM for a reactive system is a discrepancy
between the predicted state resulting from planning in
WM and the observed post-state resulting from reaction
in the world. If there is no discrepancy, it is a correct
prediction. WM' is a refinement of WM with respect to
a prediction failure if planning with WM' now results
in a correct prediction. The mutual theory refinement
problem is: given a set {WM_i} of approximate models
and a prediction failure, find a refinement {WM_i'} of
{WM_i}. The mutual theory refinement method follows
three steps. It detects a prediction failure, then uses
some models to explain the cause of the failure (due
to approximations in other models). It then repairs
the failure, producing a refinement {WM_i'} that results
in a correct prediction. A model WM becomes increasingly
correct (Mitchell, 1990) with respect to the world
over time if there is a decrease in prediction failures in
WM. The goal of the learning is to produce increasingly
correct models.

Given a prediction failure, how does the method de- 
termine which of the approximate models are causing 
the failure, which models can be used to make the re- 
finement, and which refinement can be made? Predic- 
tion failures for ERE may result from one or more of 
the following: missing or extra (over-general or over-specific)
preconditions; missing or extra (over-specific or over-general)
variant outcomes; incorrect preconditions or outcomes; missing
or incorrect domain constraints. We have focused thus far only
on refinements
of incomplete, rather than incorrect, models (missing 
preconditions, missing variant outcomes, or missing 
domain constraints). Our method draws on a catalog 
of heuristics which determine, for each type of incom- 
pleteness, what models can be used to make the refine- 
ment, and what refinement can be made. We envision 
the refinement process as incremental and plausible, 
not necessarily correct. All refinements are annotated 
and may be revised if further relevant information be- 
comes available. The degree of confidence in a partic- 
ular refinement depends both on the degree of confi- 
dence in the knowledge used in the refinement, and on 
whether the refinement is inductive or analytic. 

4. Operator Refinement 

In this section we describe methods for detecting and
repairing missing preconditions and missing variant
outcomes, illustrated with a simple TileWorld example.
For these types of incompleteness in the approximate
operator model, the method attempts to use the
approximate model of domain constraints as a basis 
for refinement. The initial recommended repair is to 
use an explanation of the failure derived from the do- 
main constraints to add missing preconditions. If the 
domain constraints are insufficient to explain the fail- 
ure, then the recommended repair is to use inductive 
methods to add a missing variant outcome. 

4.1 Missing Preconditions 

Consider a MOVE operator which describes the agent 
moving in some direction while grasping a tile in some 
other direction. Suppose that the preconditions test 
whether the destination cell of the agent is empty, but 
not whether the destination cell of the grasped tile is 
empty (i.e., a “cell-empty” precondition is missing). 1
The operator’s single variant outcome specifies that 
both the agent and the grasped tile will end up in their 
destination cells. The initial faulty operator definition 
is the following: 

(defop :name MOVE(?dir)
  :preconditions
    (agent-location (?x ?y))
    (grasping ?dir2 ?t)
    (tile-location ?t (?x2 ?y2))
    (cell-adjacent (?x ?y) ?dir (?x1 ?y1))
    (cell-adjacent (?x2 ?y2) ?dir2 (?x3 ?y3))
    (cell-empty (?x1 ?y1))
  :variant outcome :name straight :prob 1.0
    (agent-location (?x1 ?y1))
    (tile-location ?t (?x3 ?y3))
    (cell-empty (?x ?y))
    (cell-empty (?x2 ?y2)))


1 Similar errors of omission have actually occurred.
Figure 1: Failure while moving a tile

The first step detects failure when the observed 
post-state differs from all predicted states (one for each 
variant outcome in the operator). For example, sup- 
pose the agent is attempting to move to the right while 
grasping a square above, but there is a triangle next to 
the square (see Figure 1). Since move’s preconditions 
do not require the destination cell of the grasped tile 
to be empty, projection predicts that the agent and 
square will move right. However, when the reactor at- 
tempts to execute the move, it is prevented from doing 
so by the physics of the TileWorld simulator. Thus, 
the agent and square remain where they were in the 
previous state. Hence, the predicted post-state of the 
move operator differs from the observed post-state. 
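
The detection step can be sketched as follows. This is a minimal illustration that assumes states are fully observable sets of facts; the function name `prediction_failure`, the fact encoding, and the cell coordinates are assumptions made for this example rather than details from ERE.

```python
from typing import FrozenSet, Iterable, Tuple

State = FrozenSet[Tuple]   # assumed encoding: a state is a set of fact tuples


def prediction_failure(predicted: Iterable[State], observed: State) -> bool:
    """True when the observed post-state matches none of the predicted states
    (one predicted state per variant outcome of the operator)."""
    return all(p != observed for p in predicted)


# Hypothetical coordinates: agent at (1, 0), square above it, triangle to the
# right of the square.
pre = frozenset({("agent-location", (1, 0)),
                 ("tile-location", "square", (1, 1)),
                 ("tile-location", "triangle", (2, 1))})
# The single 'straight' variant predicts agent and square both shift right.
predicted = [frozenset({("agent-location", (2, 0)),
                        ("tile-location", "square", (2, 1)),
                        ("tile-location", "triangle", (2, 1))})]
observed = pre   # the simulator blocked the move, so nothing changed
failed = prediction_failure(predicted, observed)
```

With several variant outcomes, `predicted` would simply hold one state per variant, and a failure is flagged only when none of them matches.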

The next step explains the difference between the 
observed and predicted states. A possible cause for 
this discrepancy is that the predicted state is incon- 
sistent and, hence, can never be observed. Therefore, 
for each variant outcome of the operator, the result- 
ing predicted state is tested for inconsistencies using 
the domain constraints. In our example, the square 
is predicted to move to the cell that is occupied by 
the triangle (see Figure 2). Hence, the predicted state 
violates the constraint that “no two distinct tiles can 
be in the same cell”. This constitutes a single-step 
explanation of the failure. 
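
The consistency test behind this explanation step might look like the sketch below. It assumes, purely for illustration, that each domain constraint can be written as a pairwise predicate over facts; `explain`, `two_tiles_one_cell`, and the constraint label are hypothetical names.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

Fact = Tuple
State = FrozenSet[Fact]


def two_tiles_one_cell(a: Fact, b: Fact) -> bool:
    """Hypothetical pairwise form of 'no two distinct tiles can be in the same cell'."""
    return (a[0] == "tile-location" and b[0] == "tile-location"
            and a[1] != b[1] and a[2] == b[2])


def explain(predicted: State,
            constraints: Dict[str, Callable[[Fact, Fact], bool]]
            ) -> List[Tuple[str, Fact, Fact]]:
    """Return every (constraint name, fact, fact) violation in a predicted state."""
    violations = []
    # Sorting by repr only makes the output order deterministic.
    for a, b in combinations(sorted(predicted, key=repr), 2):
        for name, violated in constraints.items():
            if violated(a, b):
                violations.append((name, a, b))
    return violations


predicted = frozenset({("agent-location", (2, 0)),
                       ("tile-location", "square", (2, 1)),
                       ("tile-location", "triangle", (2, 1))})
constraints = {"no-two-tiles-in-one-cell": two_tiles_one_cell}
violations = explain(predicted, constraints)
```

An empty result would mean the domain constraints cannot explain the failure, which is the situation handled in Section 4.2.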



The final step repairs the operator by adding missing
precondition(s) that will prevent predicting this
and similar inconsistent states; although the projected 
outcomes of the revised operator might still violate 
other domain constraints. This assumes that the do- 
main constraint theory is sufficient to explain why a 
predicted state is inconsistent and, therefore, why the 
action outcome during reaction was unexpected. 

That is, given a constraint C which the predicted 
outcome state violates, the faulty operator’s precon- 
ditions are restricted to prevent the projection of not 
only this particular outcome state, but of any outcome 
state that could violate C. Goal regression (Nilsson, 1980)
is used to accomplish this general repair. Regressing
a goal over a single operator produces the
weakest (i.e., most general) preconditions that must
hold in a state such that executing the operator sat- 
isfies the goal. In this case, we want to ensure that 
the execution of the operator results in a state that 
satisfies the operator’s effects and does not violate C. 
Hence, the “goal” to regress is the operator’s outcome 
restricted to prevent violation of C. The conditions 
resulting from regression are the new preconditions, 
which are a superset of the original preconditions. 

In our example, the goal to regress is determined 
by restricting the outcomes of the move operator to 
prevent violation of “no two distinct tiles can be in 
the same cell”. Since one of the effects of move is
that “tile t is in cell (x3, y3)”, preventing violation of
the domain constraint requires that for all tiles t2, either
“tile t2 is not in cell (x3, y3)” or “t and t2 are not
distinct tiles”. This restriction can be expressed as
“no tile other than t is in cell (x3, y3)” (this transformation
is currently hand-coded). Hence, the goal to
regress consists of the effects of move plus this additional
restriction. The result of regression consists of
this restriction plus the original preconditions (with
appropriate variable bindings). Thus, the repair step
“compiles” aspects of the domain constraints into the
operators, introducing new terms such as “no other
tile”. The operator definition is thus repaired by
adding the new precondition: “no tile other than t
is in cell (x3, y3)”. 2
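
The repair can be illustrated with a deliberately simplified, single-operator regression. The sketch assumes that conditions are opaque tuples over variable names, that the restriction is supplied by hand (as in the text), and that the operator does not affect any goal condition it does not itself establish. The `Operator` class, `regress`, `repair`, and the `no-other-tile-in-cell` term are all hypothetical, and only some of the MOVE preconditions are shown.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Cond = Tuple[str, ...]   # assumed encoding, e.g. ("cell-empty", "?x1", "?y1")


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: FrozenSet[Cond]
    effects: FrozenSet[Cond]


def regress(goal: FrozenSet[Cond], op: Operator) -> FrozenSet[Cond]:
    """Simplified regression: goal conditions established by the operator are
    dropped; the rest must already hold, together with its preconditions."""
    return op.preconditions | (goal - op.effects)


def repair(op: Operator, restriction: Cond) -> Operator:
    """Regress the operator's effects plus the restriction, and install the
    result as the new preconditions."""
    goal = op.effects | {restriction}
    return Operator(op.name, regress(goal, op), op.effects)


move = Operator(
    "MOVE",
    preconditions=frozenset({
        ("agent-location", "?x", "?y"),
        ("grasping", "?dir2", "?t"),
        ("tile-location", "?t", "?x2", "?y2"),
        ("cell-empty", "?x1", "?y1"),
    }),
    effects=frozenset({
        ("agent-location", "?x1", "?y1"),
        ("tile-location", "?t", "?x3", "?y3"),
    }),
)
restriction = ("no-other-tile-in-cell", "?t", "?x3", "?y3")
repaired = repair(move, restriction)
```

Because the restriction is not among the effects of MOVE, it passes through regression unchanged and lands in the preconditions, so the repaired preconditions are a superset of the originals.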

4.2 Missing Variant Outcome

In the move operator, the predicted outcome is that 
the agent will move straight in the intended direction 
to the adjacent cell. However, the TileWorld simulator 
will occasionally cause the agent to “veer” so that it 
ends up in a cell to the right or left of the intended 
destination. This variant of the outcome is missing 
from the initial operator definition. 



Figure 3: Missing Variant Outcome: Veer Left 

As above, the first step detects when the observed 
post-state differs from all predicted states. That is, 
the operator has a different effect on the world than is 
expected. In our example, when the agent was in cell 
(1,0), a ‘move east’ resulted in the agent veering left 
and ending up in cell (2, 1) instead of moving straight
to cell (2, 0) as predicted (see Figure 3). For this type
of prediction failure, the domain constraint theory cannot
explain the discrepancy between prediction and
observation as an inconsistency in projection. In our
example, there is nothing inconsistent about predicting
the agent will move straight.

2 Note, this added precondition is slightly more specific
than the desired “cell-empty” precondition.

For this case, the recommended repair is to just 
add the observed state as a new expectation for the 
future, that is, as an additional variant outcome of the 
operator. The new variant outcome is computed as the 
difference between the observed post-state and the pre- 
state. This assumes that all observed changes can be 
attributed to the previous agent action (rather than 
to some exogenous event). The new variant is fully 
instantiated since there is no theory that supports de- 
ductive generalization of the observed instance. We 
intend to use inductive learning to generalize over a 
set of new variant outcomes. Currently, the instanti- 
ated variant outcome is included in the operator defi- 
nition with an arbitrarily low probability, and pairs of 
pre- and post- state instances are retained for induc- 
tion over future observations. In our example, the new 
variant outcome is simply “agent in cell (2, 1)”. 
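
Computing the new variant as a state difference can be sketched as below. Since ERE operators list only added facts, the sketch takes the facts present in the observed post-state but absent from the pre-state. The function name `new_variant`, the dictionary layout, the variant name, and the probability value 0.01 are assumptions for illustration; the text says only that the probability is arbitrarily low.

```python
from typing import Dict, FrozenSet, List, Tuple

Fact = Tuple
State = FrozenSet[Fact]

# Retained (pre-state, post-state) pairs for later induction.
instances: List[Tuple[State, State]] = []


def new_variant(pre: State, observed: State, prob: float = 0.01) -> Dict:
    """Build a fully instantiated variant outcome from one observed transition,
    attributing every new fact to the agent's action."""
    instances.append((pre, observed))
    return {"name": "observed-variant", "prob": prob,
            "effects": frozenset(observed - pre)}


pre = frozenset({("agent-location", (1, 0))})
observed = frozenset({("agent-location", (2, 1))})   # the agent veered left
variant = new_variant(pre, observed)
```

The resulting effects contain only the agent's new location, matching the example in the text. Attributing the whole difference to the action is exactly the no-exogenous-events assumption stated above.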

5. Learning Domain Constraints

In our first two cases, the approximate operator model 
was refined using the domain constraint model. To 
demonstrate the mutuality of the theory refinement, 
we sketch how the domain constraint model could
in turn be refined using the operator model. In partic- 
ular, we are working toward a method to address the 
problem of missing domain constraints which perform 
‘deletes’ in projection. If such a domain constraint is 
missing, the predicted state could differ from the ob- 
served state. Suppose the (previously mentioned) do- 
main constraint “the agent cannot be in two places at 
once” is missing. Then, during projection of the MOVE 
operator, the assertion regarding the previous location 
of the agent will not be deleted. Hence, in the pre- 
dicted post-state the agent is both in its old and new 
locations. This differs from the post-state observed 
during reaction, in which the agent is in its new lo- 
cation only. Since the prediction cannot be explained 
as inconsistent, yet is a superset of the observed state, 
this guides the method to check whether the superflu- 
ous predictions match preconditions. If so, the failure 
may be that certain preconditions were not deleted in 
projection because of a missing domain constraint. 

A new domain constraint is plausibly derived based 
on the approximate operator model. The domain con- 
straint should describe those superfluous predictions 
which match preconditions, yet are not in the observed 
outcomes. These need to be deleted during projection. 
The result is a new domain constraint which specifies 
that those preconditions and all the outcomes of the 
operator cannot co-occur. In this example, the new
domain constraint, derived from the move operator,
states that the precondition “agent at old location” 
and outcomes, including “agent at new location”, can- 
not co-occur. Thus a specialization of the missing do- 
main constraint “the agent cannot be in two locations 
at once” can be learned using the operators. 
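
A sketch of this derivation, under the assumption that states, instantiated preconditions, and outcomes are all sets of fact tuples. The function `derive_constraint` and its parameter names are hypothetical; the paper describes this case as work in progress, so the code is one plausible reading of the heuristic rather than an implemented method.

```python
from typing import FrozenSet, Optional, Tuple

Fact = Tuple
State = FrozenSet[Fact]


def derive_constraint(preconds: State, outcome: State,
                      predicted: State, observed: State) -> Optional[State]:
    """Propose a set of facts that cannot co-occur, or None if the heuristic
    does not apply."""
    if not observed < predicted:         # prediction must strictly contain observation
        return None
    superfluous = predicted - observed   # predicted but never observed
    undeleted = superfluous & preconds   # superfluous facts that match preconditions
    if not undeleted:
        return None
    return frozenset(undeleted | outcome)


old = ("agent-location", (1, 0))
new = ("agent-location", (2, 0))
constraint = derive_constraint(preconds=frozenset({old}),
                               outcome=frozenset({new}),
                               predicted=frozenset({old, new}),
                               observed=frozenset({new}))
```

The returned set pairs the undeleted precondition with the operator's outcome, which is the specialized form of the constraint described above. A `None` result signals that some other explanation for the failure is needed.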

6. Related and Future Work 

There are many approaches to theory refinement. Traditional
approaches to refinement of knowledge bases
in expert systems (e.g., Politakis & Weiss, 1984) typically
perform induction over cases. Purely inductive
approaches to refining models for reactive systems include
reinforcement learning (Lin, 1990) and experimentation
(Gil, 1991; Christiansen et al., 1990).

Analytic, or explanation-based learning, approaches
to refining approximate theories fall into four broad
categories: (i) using a complete and correct auxiliary
theory to refine the approximate one, as in most EBL
from failure (Hammond, 1986; Mostow & Bhatnagar,
1987; Chien, 1989; Gupta, 1987; Tadepalli, 1989);
(ii) relying on induction and possibly experimentation
when the explanation is insufficient (Rajamoney & DeJong,
1988; Ourston & Mooney, 1990; Pazzani, 1988;
Danyluk, 1989; Ali, 1989); (iii) augmenting the system’s
knowledge through apprenticeship learning from
the user (Wilkins, 1988; Smith et al., 1985; Laird et
al., 1990); (iv) using one approximate theory to refine
another. We distinguish our work from the first three
approaches to refining approximate theories in that we
use one approximate theory to refine another (Bennett
(1990) is a closely related approach).

In order to circumscribe the problem initially, we 
have made a number of assumptions, and did not ad- 
dress certain issues. One major assumption underlying 
the current work is that the approximations are due to 
incompleteness rather than incorrectness. Another assumption
is that there is a single recommended refinement.
These assumptions need to be removed in the future. Other
future issues are: (i) utility of the refinement - being
selective as to which new information to retain and
which to “forget”; (ii) consistency maintenance - coordinating
the refinements in several models to reflect
each other correctly as new refinements are made; (iii)
dependency maintenance - retaining the appropriate
justifications to retract earlier faulty decisions, as in
(Smith et al., 1985); (iv) eager versus lazy refinement
- trading offline processing of errors versus waiting until
they are observed through failure; (v) deciding what
to sense in the world for failure detection; (vi) refinement
based on partial observations.

In conclusion, we have discussed three cases of mutual
theory refinement (the first two of which have
been implemented). In the first, the operator model
was refined with the aid of the approximate domain
constraint model. In the second, inductive learning
was used because the domain constraint model was
deemed insufficient to refine the operators analytically.
And in the third, the domain constraint model was refined
using the approximate operator model. These
methods begin to pave the way for more robust reactive
systems, better able to learn from their failures
and refine their models with experience.